In 2021, the Charlotte Area Transit System (CATS) embarked on ‘The Bus Priority Study’ as part of the city’s larger Strategic Mobility Plan, aiming to enhance bus trips’ speed, reliability, and convenience for riders. To achieve this, CATS set out to understand recent dynamics, explore accessibility, and examine network efficiencies.
Our team will support CATS transportation planners in improving bus line efficiency by analyzing historical ridership trends per bus stop to identify and predict underperforming stops and routes.
With this, we will propose alternate network options through a web-based dashboard that presents predictions with the context of demographic, spatial, financial factors of influence.
To analyze and forecast performance trends, we have used ridership data from CATS for the years between 2017 and 2023. Additionally we have identified potential demographic, spatial, and economic parameters that may influence our predictor variable that is ridership.
On initial review, certain limitations of the data present themselves. These include: 1. No hourly ridership data is available precluding us from analyzing peak hour trends 2. Weekend vs weekday ridership is not available which leads to the assumption of similar usage patterns of weekends, weekdays, and holidays 3. Data on stops added or removed in these years of study is not explicit 4. Only stop locations are made available which means route locations will have to be sourced and spatially joined. This allows us to join only the most recent route data for these stops. 5. The dataset for 2023 is incomplete and only includes data for months January to November.
ggplot(busstop_sf) +
geom_sf(data = allTracts_sf, fill = "grey90", alpha = 0.8 , color = NA)+
geom_sf(aes(color = Board), size = .5) +
scale_color_gradient(low = "blue", high = "red") +
theme_minimal() +
labs(title = "Bus Stops", color = "Board Value")
ggplot(busstop_sf) +
geom_sf(data = allTracts_sf, fill = "grey90", alpha = 0.8 , color = NA)+
geom_sf(aes(color = Alight), size = .5) +
scale_color_gradient(low = "blue", high = "red") +
theme_minimal() +
labs(title = "Bus Stops", color = "Alight Value")
To investigate the nature of performance before and after pandemic years further, the difference between average ridership is mapped pre Covid (2017- 2020/3) and post Covid (2020/3 - 2023). This mapping exercise indicates that ridership has decreased in most bus stops post the pandemic. Additionally, some stops show a hundred percent decline. These are filtered out because it may indicate that the stop is no longer in service.
Next, stops with an increase in ridership are separated from stops with a decrease and the results are mapped. This suggests that a majority of stops in the city center have experienced a decrease in ridership. This spatial clustering of underperforming stops suggests areas for targeted intervention.
palettebase5_rev <- rev(palettebase5)
ggplot() +
geom_sf(data = dat2020, fill = "grey90", alpha = 0.5 , color = NA)+
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(data = stops_sf , aes(colour = q5(Difference)),
show.legend = "point", size = 1, alpha= 0.5) +
scale_colour_manual(values = palettebase5_rev,
labels=qBr(stops_sf ,"Difference"),
name="Quintile\nBreaks (%)") +
labs(title = paste("Difference (%) - Month:", selected_month),
fill = "Difference (Post - Pre)") +
mapTheme()
filtered_stops_sf <- stops_sf[stops_sf$Difference != -100, ]
ggplot() +
geom_sf(data = dat2020, fill = "grey90", alpha = 0.5 , color = NA)+
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(data = filtered_stops_sf , aes(colour = q5(Difference)),
show.legend = "point", size = 1, alpha= 0.5) +
scale_colour_manual(values = palettebase5_rev,
labels=qBr(filtered_stops_sf ,"Difference"),
name="Quintile\nBreaks (%)") +
labs(title = paste("Difference (%) (-100% excluded) - Month:", selected_month),
fill = "Difference (Post - Pre)") +
mapTheme()
stops_sf$PositiveDifference <- stops_sf$Difference > 0
positive_stops_sf <- stops_sf[stops_sf$PositiveDifference,]
negative_stops_sf <- stops_sf[!stops_sf$PositiveDifference,]
yellow_paletteaccent_rev <- rev(yellow_paletteaccent)
# positive and negative differences
positive_plot <- ggplot(positive_stops_sf, aes(colour = q5(Difference))) +
geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(size = 1) +
scale_color_manual(values = red_paletteaccent,
labels = qBr(positive_stops_sf, "Difference"),
name = "Quintile\nBreaks (%)") +
labs(title = paste("Positive Increase (%) - Month:", selected_month),
fill = "Positive Difference (Post - Pre)") +
mapTheme()
negative_plot <- ggplot(negative_stops_sf, aes(colour = q5(Difference))) +
geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(size = 1) +
scale_color_manual(values = yellow_paletteaccent_rev,
labels = qBr(negative_stops_sf, "Difference"),
name = "Quintile\nBreaks (%)") +
labs(title = paste("Negative Decrease (%) - Month:", selected_month),
fill = "Negative Difference (Post - Pre)") +
mapTheme()
combined_plot <- plot_grid(positive_plot, negative_plot, ncol = 2)
print(combined_plot)
# Exclude -100%
filtered_negative_stops_sf <- negative_stops_sf[negative_stops_sf$Difference != -100, ]
negative_plot <- ggplot(filtered_negative_stops_sf, aes(colour = q5(Difference))) +
geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(size = 1) +
scale_color_manual(values = yellow_paletteaccent_rev,
labels = qBr(filtered_negative_stops_sf, "Difference"),
name = "Quintile\nBreaks (%)") +
labs(title = paste("Negative Decrease (%) (-100% excluded) - Month:", selected_month),
fill = "Negative Difference (Post - Pre)") +
mapTheme()
combined_plot <- plot_grid(positive_plot, negative_plot, ncol = 2)
print(combined_plot)
Between 2017-2023, there is a uniform trend of underperformance in certain underperforming stops. Other underperforming stops exhibit a growing trend of underperformance post the pandemic. About half of these stops exhibit spatial clustering patterns and are located along a single route.
# filter underperforming stops
underperforming_stops <- stops_data[(stops_data$Board + stops_data$Alight) <= 10, ]
underperformance_counts <- table(underperforming_stops$Stop_ID)
top_50_stops <- names(sort(underperformance_counts, decreasing = TRUE)[1:50])
underperforming_top_50 <- underperforming_stops[underperforming_stops$Stop_ID %in% top_50_stops, ]
underperforming_top_50 <- underperforming_top_50 %>%
mutate(Total_Board_Alight = Board + Alight)
# Order
underperforming_top_50$Stop_ID <- factor(underperforming_top_50$Stop_ID, levels = rev(top_50_stops))
stacked_bar_plot <- ggplot(underperforming_top_50, aes(x = Stop_ID, fill = factor(Year))) +
geom_bar(stat = "count", position = "stack", color = "darkgrey" ) +
labs(title = "Top 50 Bus Stops with Highest Underperformance Counts",
x = "Bus Stop Name",
y = "Number of Times Underperformed",
fill = "Year") +
scale_fill_manual(values = palettebase7) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) # Rotate x-axis labels
## print(stacked_bar_plot)
underperforming_top_50_all_sf <- st_as_sf(underperforming_top_50, coords = c("Longitude", "Latitude"), crs = 4326)
underperforming_top_50_all_sf <- st_transform(underperforming_top_50_all_sf, crs = 3358) # NAD83 North Carolina
map_plot_top_50 <- ggplot() +
geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(data = underperforming_top_50_all_sf, aes(colour = q5(Total_Board_Alight)), size = 2) +
scale_color_manual(values = rev(palettebase5),
labels = qBr(underperforming_top_50_all_sf, "Total_Board_Alight"),
name = "Quintile\nBreaks") +
labs(title = "Top 50 Bus Stops with Highest Underperformance Counts (2017 - 2023)",
fill = "Avg_Board + Avg_Alight") +
mapTheme()
## print(map_plot_top_50)
combined_plot <- stacked_bar_plot + map_plot_top_50
print(combined_plot)
The underperformance and spatial clustering trends in 2023 are consistent with the overall trend.
# Underperforming Stops in 2022
underperforming_stops_2022 <- stops_data[(stops_data$Board + stops_data$Alight) <= 10 & stops_data$Year == 2022, ]
underperformance_counts_2022 <- table(underperforming_stops_2022$Stop_ID)
# Top 30 Stops for 2023
top_50_stops_2022 <- names(sort(underperformance_counts_2022, decreasing = TRUE)[1:50])
underperforming_top_50_2022 <- underperforming_stops_2022[underperforming_stops_2022$Stop_ID %in% top_50_stops_2022, ]
# Order
underperforming_top_50_2022$Stop_ID <- factor(underperforming_top_50_2022$Stop_ID, levels = rev(top_50_stops_2022))
bar_plot_top_50_2022 <- ggplot(underperforming_top_50_2022, aes(x = Stop_ID, fill = "2022")) +
geom_bar(stat = "count", color = "black") +
labs(title = "Top 50 Bus Stops with Highest Underperformance Counts",
x = "Bus Stop Name",
y = "Number of Times Underperformed",
fill = "Year") +
scale_fill_manual(values = c("2022" = "#7ACFD3"), name = "Year") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
##print(bar_plot_top_50_2022)
underperforming_top_50_2022 <- underperforming_top_50_2022 %>%
mutate(Total_Board_Alight = Board + Alight)
underperforming_top_50_sf <- st_as_sf(underperforming_top_50_2022, coords = c("Longitude", "Latitude"), crs = 4326)
underperforming_top_50_sf <- st_transform(underperforming_top_50_sf, crs = 3358) # NAD83 North Carolina
map_plot_top_50_2022 <- ggplot() +
geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(data = underperforming_top_50_sf, aes(colour = q5(Total_Board_Alight)), size = 2) +
scale_color_manual(values = rev(palettebase5),
labels = qBr(underperforming_top_50_sf, "Total_Board_Alight"),
name = "Quintile\nBreaks") +
labs(title = "Top 50 Bus Stops with Highest Underperformance Counts in 2022",
fill = "Total_Board_Alight") +
mapTheme()
##print(map_plot_top_50_2022)
combined_plot <- bar_plot_top_50_2022 + map_plot_top_50_2022
print(combined_plot)
##print(map_plot_top_50)
##print(map_plot_top_50_2022)
##print(stacked_bar_plot)
##print(bar_plot_top_50_2022)
Other existing and proposed transit stops and routes are considered to study the relationship between infrastructure and underperforming stops, if any. If significant, the distance of bus stops to other transit stops would also serve as variables in the model.
transit <- ggplot() +
geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, aes(color = "Bus Route"), size = 5, alpha = 0.5) +
geom_sf(data = blue_routes, aes(color = "Blue Line"), size = 8) +
geom_sf(data = blue_stops, aes(color = "Blue Line"), size = 2) +
geom_sf(data = gold_routes, aes(color = "Gold Line"), size = 8) +
geom_sf(data = gold_stops, aes(color = "Gold Line"), size = 2) +
geom_sf(data = silver_routes, aes(color = "Silver Line"), size = 8, linetype = "dashed") +
geom_sf(data = silver_stops, aes(color = "Silver Line"), size = 2, shape = 21) +
geom_sf(data = red_routes, aes(color = "Red Line"), size = 8, linetype = "dashed") +
geom_sf(data = red_stops, aes(color = "Red Line"), size = 2, shape = 21) +
labs(title = "Existing and Proposed Transit Lines and Stops") +
scale_color_manual(name = "Lines",
values = c("Bus Route" = "darkgrey",
"Blue Line" = "#4682B4",
"Gold Line" = "#FFD700",
"Silver Line" = "purple",
"Red Line" = "#FF6347"),
labels = c("Blue Line","Bus Routes", "Gold Line", "Proposed Red Line", "Proposed Silver Line")) +
mapTheme()
print(transit)
Most amenities are clustered within the city center and expanded to the south, presenting an inequitable distribution. For model input, the distance to the amenities would serve as variables as well
A <- ggplot() + geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(data = school, color = "darkgrey", size = .5, alpha = 0.3) +
stat_density2d(data = data.frame(st_coordinates(school.sf)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradient(low = "grey", high = "#FFE6A7", name = "Density") +
scale_alpha(range = c(0.00, 0.1), guide = "none") +
labs(title = "Density of Schools") +
mapTheme()
B <- ggplot() + geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(data = parks, color = "darkgrey", size = .5, alpha = 0.5) +
stat_density2d(data = data.frame(st_coordinates(parks.sf)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradient(low = "grey", high = "#7ACFD3", name = "Density") +
scale_alpha(range = c(0.00, 0.1), guide = "none") +
labs(title = "Density of Parks") +
mapTheme()
C <- ggplot() + geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(data = groceries, color = "darkgrey", size = .5, alpha = 0.5) +
stat_density2d(data = data.frame(st_coordinates(groceries.sf)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradient(low = "grey", high = "#FFAB00", name = "Density") +
scale_alpha(range = c(0.00, 0.1), guide = "none") +
labs(title = "Density of Grocery Stores") +
mapTheme()
D <- ggplot() + geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(data = policeoffice, color = "darkgrey", size = .5, alpha = 0.5) +
stat_density2d(data = data.frame(st_coordinates(policeoffice.sf)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradient(low = "grey", high = "#0077B6", name = "Density") +
scale_alpha(range = c(0.00, 0.1), guide = "none") +
labs(title = "Density of Police Offices") +
mapTheme()
E <- ggplot() + geom_sf(data = dat2020, fill = "grey90", alpha = 0.5, color = NA) +
geom_sf(data = bus_routes, color = "darkgrey", size = 5, alpha = 0.5) +
geom_sf(data = shoppingcen, color = "darkgrey", size = .5, alpha = 0.5) +
stat_density2d(data = data.frame(st_coordinates(shoppingcen.sf)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradient(low = "grey", high = "#FF6662", name = "Density") +
scale_alpha(range = c(0.00, 0.1), guide = "none") +
labs(title = "Density of Shopping Centers") +
mapTheme()
combined_maps <- A + B + C + D + E
#combined_maps <- combined_maps / plot_layout(nrow = 2, ncol = 3)
print(combined_maps)
Almost all census tracts with a higher percentage of population that commute by bus to work are located close to bus stops. However, some suburban census tracts with a high percentage of population that commute by bus are far from bus stops. This may be considered as a variable that influences prediction.
a <- ggplot(allTracts, aes(x = as.factor(year), y = TotalPop, fill = as.factor(year))) +
geom_bar(stat = "identity", color = "transparent") +
labs(title = "Total Population: 2017-2022", x = "Year", y = "Total Population") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 6)) +
scale_fill_manual(values = palettebase)
b <- ggplot(allTracts, aes(x = as.factor(year), y = TotalWorker, fill = as.factor(year))) +
geom_bar(stat = "identity", color = "transparent") +
labs(title = "Population of Worker: 2017-2022", x = "Year", y = "Total Population") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 6)) +
scale_fill_manual(values = palettebase)
c <- ggplot(allTracts, aes(x = as.factor(year), y = Workfromhome, fill = as.factor(year))) +
geom_bar(stat = "identity", color = "transparent") +
labs(title = "Population of Work from Home: 2017-2022", x = "Year", y = "Total Population") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 6)) +
scale_fill_manual(values = palettebase)
d <- ggplot(allTracts, aes(x = as.factor(year), y = Workwithbus, fill = as.factor(year))) +
geom_bar(stat = "identity", color = "transparent") +
labs(title = "Population of People Using Bus as Transportation to Work: 2017-2022", x = "Year", y = "Total Population") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 6)) +
scale_fill_manual(values = palettebase)
combined_plot <- grid.arrange(a,b,c,d, ncol = 4, top = c(2, 2))
print(combined_plot)
## TableGrob (1 x 4) "arrange": 4 grobs
## z cells name grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
## 3 3 (1-1,3-3) arrange gtable[layout]
## 4 4 (1-1,4-4) arrange gtable[layout]
The majority of Mecklenburg County population resides in suburban census tract, a trend that is consistent with the number of employed individuals in the county who also live in the suburbs. However, when examining census data on the use of public transportation to commute to work, a significant proportion is concentrated in the city center and expanded to the south. This can be correlated with census data on the prevalence of working from home, which tends to be dispersed in suburban areas situated relatively farther away from the city center
average_pct_cars <- allTracts %>%
group_by(year) %>%
summarise(avg_pct_0cars = mean(pctnocar),
avg_pct_1cars = mean(pct1car),
avg_pct_2cars = mean(pct2car),
avg_pct_3cars = mean(pct3car),
avg_pct_4cars = mean(pct4car),
avg_pct_5ormore_cars = mean(pct5car))
average_pct_cars_long <- pivot_longer(average_pct_cars,
cols = starts_with("avg_pct"),
names_to = "Cars",
values_to = "Percentage")
plot_list <- list()
for (i in unique(average_pct_cars_long$Cars)) {
num_cars <- gsub("avg_pct_", "", i)
num_cars <- gsub("cars", "car", num_cars)
subset_data <- subset(average_pct_cars_long, Cars == i)
plot <- ggplot(subset_data, aes(x = year, y = Percentage, group = 1)) +
geom_line(color = "black",linewidth=1.5) +
geom_point(color = "red",size=3) +
labs(x = "Year",
y = "Average Percentage of Households",
title = paste("Trend of Car Ownership:", num_cars)) +
scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
theme_minimal()
plot_list[[i]] <- plot
}
grid.arrange(grobs = plot_list, ncol = 2, top = "Trend of Car Ownership (2017-2022)")
1.To identify most significant demographic and spatial variables for predicting underperformance 2. To present findings to clients and solicit feedback 3. To arrive at best modeling process to predict stops that are likely to underperform 4. To propose removal of and addition of stops to certain routes and examine their equity and financial impact 5. To create a web-based dashboard that allows CATS transportation planners to overlay socioeconomic, equity, and financial factors with underperformance to be able to allocate resources to stops effectively
In doing so, our aim is to build a comprehensive and robust model to optimize bus line efficiency, aligning with CATS’ goal of providing faster, more reliable, and convenient transit options for Charlotte residents.